Data Visualization Workshop

By Xuechun

Today's Topics:

  1. Data Visualization Tools

  2. Demo of creating maps using Python (folium, Plotly)

  3. How to embed visuals on your website

Some Data Visualization Tools

Some other free tools

  1. Openheatmap: http://www.openheatmap.com/
  2. Flourish: https://flourish.studio/
  3. D3.js (Data-Driven Documents): https://d3js.org (requires coding)

Crime Data Visualization Using Python

Python, Anaconda :

Packages used:

pandas, geopandas, folium, branca, shapely, plotly

Installing packages in Python: https://packaging.python.org/tutorials/installing-packages/
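Before running the demo, it can help to check which of the packages imported in this notebook are actually installed. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this illustration.

```python
from importlib.util import find_spec

# Top-level packages imported somewhere in this notebook
packages = ["pandas", "geopandas", "folium", "branca", "shapely", "plotly"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(packages)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All packages found.")
```

Anything reported as missing can then be installed by following the packaging guide linked above.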

Data Source:

Online Courses:

Work Demos

In [1]:
## Import required packages
import pandas as pd
import geopandas as gpd
import folium
from folium import plugins
import branca
import time
start_time = time.time()
In [2]:
## Load geojson as geopandas dataframe
dc_zil_gdf = gpd.read_file("zillow-neighborhoods.geojson")    
dc_zil_gdf.head()
Out[2]:
   city        name                 regionid  county                state  geometry
0  Washington  Catholic University  273159    District of Columbia  DC     POLYGON ((-77.00433 38.94064, -77.00423 38.940...
1  Washington  McLean Gardens       121759    District of Columbia  DC     POLYGON ((-77.07520 38.93977, -77.07475 38.938...
2  Washington  Lincoln Heights      121751    District of Columbia  DC     POLYGON ((-76.92405 38.89835, -76.92303 38.898...
3  Washington  Kenilworth           121743    District of Columbia  DC     POLYGON ((-76.93406 38.91220, -76.93426 38.911...
4  Washington  Bellevue             121674    District of Columbia  DC     POLYGON ((-77.01639 38.80932, -77.01753 38.808...
In [6]:
dc_zil_gdf.shape
Out[6]:
(137, 6)
In [29]:
dc_crime_2019 = pd.read_csv('Crime_Incidents_in_2019.csv')
In [10]:
dc_crime_2019.columns
Out[10]:
Index(['X', 'Y', 'CCN', 'REPORT_DAT', 'SHIFT', 'METHOD', 'OFFENSE', 'BLOCK',
       'XBLOCK', 'YBLOCK', 'WARD', 'ANC', 'DISTRICT', 'PSA',
       'NEIGHBORHOOD_CLUSTER', 'BLOCK_GROUP', 'CENSUS_TRACT',
       'VOTING_PRECINCT', 'LATITUDE', 'LONGITUDE', 'BID', 'START_DATE',
       'END_DATE', 'OBJECTID', 'OCTO_RECORD_ID'],
      dtype='object')
In [32]:
dc_crime_2019.head()
Out[32]:
X Y CCN REPORT_DAT SHIFT METHOD OFFENSE BLOCK XBLOCK YBLOCK ... BLOCK_GROUP CENSUS_TRACT VOTING_PRECINCT LATITUDE LONGITUDE BID START_DATE END_DATE OBJECTID OCTO_RECORD_ID
0 -76.982944 38.887599 10199597 2019-11-07T11:41:36.000Z DAY OTHERS THEFT/OTHER 1500 - 1599 BLOCK OF INDEPENDENCE AVENUE SE 401480.0 135528.0 ... 006801 2 6801.0 Precinct 87 38.887592 -76.982941 NaN 2019-11-07T10:36:52.000Z 2019-11-07T11:42:02.000Z 429611163 10199597-01
1 -77.010378 38.820469 17084415 2019-01-28T00:00:00.000Z MIDNIGHT GUN HOMICIDE 130 - 199 BLOCK OF IRVINGTON STREET SW 399099.0 128076.0 ... 010900 2 10900.0 Precinct 126 38.820461 -77.010375 NaN 2017-05-19T22:58:53.000Z 2017-05-20T02:26:45.000Z 429841378 17084415-01
2 -76.952665 38.920544 18208996 2019-03-22T16:18:15.000Z EVENING OTHERS THEFT/OTHER 2400 BLOCK OF MARKET STREET NE 404105.0 139186.0 ... 009000 1 9000.0 Precinct 139 38.920536 -76.952663 NaN 2018-12-09T17:01:49.000Z 2018-12-09T18:49:21.000Z 429890611 18208996-01
3 -77.027565 38.897353 18221681 2019-01-01T10:24:06.000Z DAY OTHERS THEFT/OTHER 1100 - 1199 BLOCK OF F STREET NW 397609.0 136611.0 ... 005800 1 5800.0 Precinct 129 38.897346 -77.027563 DOWNTOWN 2018-12-31T11:49:19.000Z 2018-12-31T14:43:21.000Z 429890721 18221681-01
4 -77.021929 38.899129 18221708 2019-01-01T15:48:01.000Z EVENING OTHERS THEFT/OTHER 700 - 799 BLOCK OF 7TH STREET NW 398098.0 136808.0 ... 005800 1 5800.0 Precinct 129 38.899121 -77.021926 DOWNTOWN 2018-12-31T12:48:46.000Z 2018-12-31T12:51:47.000Z 429890728 18221708-01

5 rows × 25 columns

In [31]:
dc_crime_2019['NEIGHBORHOOD_CLUSTER']
Out[31]:
0        Cluster 26
1        Cluster 39
2        Cluster 24
3         Cluster 8
4         Cluster 8
            ...    
33905    Cluster 25
33906    Cluster 33
33907    Cluster 33
33908    Cluster 25
33909    Cluster 25
Name: NEIGHBORHOOD_CLUSTER, Length: 33910, dtype: object
In [18]:
dc_zil_gdf.columns
Out[18]:
Index(['city', 'name', 'regionid', 'county', 'state', 'geometry'], dtype='object')

Encode Neighborhood

In [3]:
## only shape and Point are used below
from shapely.geometry import shape, Point
In [23]:
dc_crime_2019['neighborhood'] = ""
long = dc_crime_2019.columns.get_loc('LONGITUDE')
lat = dc_crime_2019.columns.get_loc('LATITUDE')
nhood = dc_crime_2019.columns.get_loc('neighborhood')

## build the neighborhood polygons and names once, outside the loop
polygons = [shape(geom) for geom in dc_zil_gdf['geometry']]
names = list(dc_zil_gdf['name'])

## use shapely to check if lat/lon is within the zillow neighborhood shape
for i in range(len(dc_crime_2019)):
    point = Point(dc_crime_2019.iloc[i,long],dc_crime_2019.iloc[i,lat]) ## Longitude, Latitude

    for polygon, nhood_name in zip(polygons, names):
        if polygon.contains(point):
            dc_crime_2019.iloc[i, nhood] = nhood_name
            
dc_crime_2019.to_csv("dc_crime_2019_final.csv", index = False) ## write the data so we don't have to re-run this every time
        
dc_crime_2019.head()
Out[23]:
X Y CCN REPORT_DAT SHIFT METHOD OFFENSE BLOCK XBLOCK YBLOCK ... CENSUS_TRACT VOTING_PRECINCT LATITUDE LONGITUDE BID START_DATE END_DATE OBJECTID OCTO_RECORD_ID neighborhood
0 -76.982944 38.887599 10199597 2019-11-07T11:41:36.000Z DAY OTHERS THEFT/OTHER 1500 - 1599 BLOCK OF INDEPENDENCE AVENUE SE 401480.0 135528.0 ... 6801.0 Precinct 87 38.887592 -76.982941 NaN 2019-11-07T10:36:52.000Z 2019-11-07T11:42:02.000Z 429611163 10199597-01 Kingman Park
1 -77.010378 38.820469 17084415 2019-01-28T00:00:00.000Z MIDNIGHT GUN HOMICIDE 130 - 199 BLOCK OF IRVINGTON STREET SW 399099.0 128076.0 ... 10900.0 Precinct 126 38.820461 -77.010375 NaN 2017-05-19T22:58:53.000Z 2017-05-20T02:26:45.000Z 429841378 17084415-01 Bellevue
2 -76.952665 38.920544 18208996 2019-03-22T16:18:15.000Z EVENING OTHERS THEFT/OTHER 2400 BLOCK OF MARKET STREET NE 404105.0 139186.0 ... 9000.0 Precinct 139 38.920536 -76.952663 NaN 2018-12-09T17:01:49.000Z 2018-12-09T18:49:21.000Z 429890611 18208996-01 Fort Lincoln
3 -77.027565 38.897353 18221681 2019-01-01T10:24:06.000Z DAY OTHERS THEFT/OTHER 1100 - 1199 BLOCK OF F STREET NW 397609.0 136611.0 ... 5800.0 Precinct 129 38.897346 -77.027563 DOWNTOWN 2018-12-31T11:49:19.000Z 2018-12-31T14:43:21.000Z 429890721 18221681-01 Penn Quarter
4 -77.021929 38.899129 18221708 2019-01-01T15:48:01.000Z EVENING OTHERS THEFT/OTHER 700 - 799 BLOCK OF 7TH STREET NW 398098.0 136808.0 ... 5800.0 Precinct 129 38.899121 -77.021926 DOWNTOWN 2018-12-31T12:48:46.000Z 2018-12-31T12:51:47.000Z 429890728 18221708-01 Chinatown

5 rows × 26 columns
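The `polygon.contains(point)` call in the loop above is a point-in-polygon test. As a rough illustration of what such a test does, here is a pure-Python ray-casting sketch. The two rectangles and their names are hypothetical, not real neighborhood boundaries, and the sketch ignores details that shapely handles (holes, multi-part polygons, points exactly on an edge).

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside

# Hypothetical rectangles given as (longitude, latitude) corners
toy_neighborhoods = {
    "West Box": [(-77.10, 38.80), (-77.00, 38.80), (-77.00, 38.90), (-77.10, 38.90)],
    "East Box": [(-77.00, 38.80), (-76.90, 38.80), (-76.90, 38.90), (-77.00, 38.90)],
}

def find_neighborhood(lon, lat, neighborhoods):
    """Return the first neighborhood containing the point, or "" if none does."""
    for nhood_name, ring in neighborhoods.items():
        if point_in_polygon(lon, lat, ring):
            return nhood_name
    return ""

print(find_neighborhood(-77.05, 38.85, toy_neighborhoods))
print(find_neighborhood(-76.95, 38.85, toy_neighborhoods))
```

As in the notebook, the point is built as (longitude, latitude), matching the x/y order of the polygon coordinates.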

In [4]:
dc_crime_2019 = pd.read_csv("dc_crime_2019_final.csv")
In [5]:
nhood_incidents_all = dc_crime_2019.neighborhood.value_counts()
nhood_map = dc_zil_gdf.merge(nhood_incidents_all.to_frame('Incidents_All'), left_on = 'name',right_index = True)
nhood_map.head()
Out[5]:
   city        name                 regionid  county                state  geometry                                           Incidents_All
0  Washington  Catholic University  273159    District of Columbia  DC     POLYGON ((-77.00433 38.94064, -77.00423 38.940...  130
1  Washington  McLean Gardens       121759    District of Columbia  DC     POLYGON ((-77.07520 38.93977, -77.07475 38.938...  19
2  Washington  Lincoln Heights      121751    District of Columbia  DC     POLYGON ((-76.92405 38.89835, -76.92303 38.898...  66
3  Washington  Kenilworth           121743    District of Columbia  DC     POLYGON ((-76.93406 38.91220, -76.93426 38.911...  47
4  Washington  Bellevue             121674    District of Columbia  DC     POLYGON ((-77.01639 38.80932, -77.01753 38.808...  206
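The count-and-merge step above can be sketched on toy data. One thing worth noticing: `merge` defaults to an inner join, so a neighborhood with no incidents would be dropped rather than kept with a count of 0. The table contents below are hypothetical; only the column names `name`, `neighborhood` and `Incidents_All` come from the notebook.

```python
import pandas as pd

# Toy stand-ins for the crime table and the neighborhood table (hypothetical values)
toy_crime = pd.DataFrame({"neighborhood": ["A", "A", "B", "A", "B"]})
toy_nhoods = pd.DataFrame({"name": ["A", "B", "C"]})

# Incidents per neighborhood, as a one-column frame indexed by neighborhood name
counts = toy_crime["neighborhood"].value_counts().to_frame("Incidents_All")

# Default (inner) merge: neighborhood "C" has no incidents and is dropped
inner = toy_nhoods.merge(counts, left_on="name", right_index=True)

# Left merge keeps "C"; its missing count is filled with 0
left = toy_nhoods.merge(counts, left_on="name", right_index=True, how="left")
left["Incidents_All"] = left["Incidents_All"].fillna(0).astype(int)

print(inner)
print(left)
```

If zero-incident neighborhoods should still be drawn on the map, the `how="left"` variant is the one that keeps them.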
In [7]:
import matplotlib.pyplot as plt

## distribution of incident counts across neighborhoods
plt.hist(nhood_map['Incidents_All'], bins=10)
plt.show()
In [35]:
max(nhood_map['Incidents_All'])
Out[35]:
1892

Create a Crime Heatmap Using Python Folium

In [33]:
## identifies the center point of all the neighborhood shapes 
centroid=dc_zil_gdf.geometry.centroid 
## initiates a map based on the centroid
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], zoom_start=12) 
In [34]:
m
Out[34]:
In [14]:
nhood_map['QP'] = nhood_map['Incidents_All'] / nhood_map['Incidents_All'].sum()
nhood_map['QP_str'] = nhood_map['QP'].apply(lambda x : str(round(x*100, 1)) + '%')

name = "DC Crime Map"
leg_brks = [0, 50.0, 150.0, 250.0,500,750.0, 1000.0, 1892.0]
colorscale = branca.colormap.linear.YlOrRd_09.scale(nhood_map['Incidents_All'].min(), nhood_map['Incidents_All'].max()) 
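## converts to a 7-step scale; `quantiles` is documented as fractions used together with `data`,
## so to use leg_brks as explicit break values, `index = leg_brks` is likely what is intended
colorscale = colorscale.to_step(n = 7, quantiles = leg_brks)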
colorscale.caption = name ## adds name for legend
colorscale
Out[14]:
[colormap legend: 2.0 to 1892.0]
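A step colormap assigns every value to one of a fixed number of bins. As a sketch of how explicit break values such as `leg_brks` would sort incident counts into bins, the standard-library `bisect` module is enough. The helper `bin_index` is made up for this illustration and is not part of branca.

```python
from bisect import bisect_right

# Break values from the notebook: 8 breaks define 7 bins
leg_brks = [0, 50.0, 150.0, 250.0, 500, 750.0, 1000.0, 1892.0]

def bin_index(value, breaks):
    """Return the 0-based bin i such that breaks[i] <= value < breaks[i + 1]."""
    i = bisect_right(breaks, value) - 1
    # Clamp so values at or beyond the ends land in the first or last bin
    return max(0, min(i, len(breaks) - 2))

# Incident counts taken from the first rows of the merged table
for count in [130, 19, 66, 47, 206]:
    print(count, "-> bin", bin_index(count, leg_brks))
```

With skewed data like this (most neighborhoods well below the maximum of 1892), hand-picked breaks keep the low end of the scale from collapsing into a single color.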
In [37]:
centroid=dc_zil_gdf.geometry.centroid 
## initiates a map based on the centroid
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], tiles="Stamen Toner", zoom_start=12) 
m
Out[37]:
In [38]:
# nhood_map['QP'] = nhood_map['Incidents_All'] / nhood_map['Incidents_All'].sum()
# nhood_map['QP_str'] = nhood_map['QP'].apply(lambda x : str(round(x*100, 1)) + '%')

# from branca.colormap import linear
# nbh_count_colormap = linear.YlGnBu_09.scale(min(nhood_map['Incidents_All']),
#                                             max(nhood_map['Incidents_All']))

## identifies the center point of all the neighborhood shapes 
centroid=dc_zil_gdf.geometry.centroid 
## initiates a map based on the centroid
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], tiles="Stamen Toner", zoom_start=12) 
style_function = lambda x: {"weight":1
                             , 'color': '#545453'
                             ## if variable is 0 map is a very light grey
                             ## else colorscale applies based on variable
                             , 'fillColor':'#9B9B9B' if x['properties']['Incidents_All'] == 0 
                             else colorscale(x['properties']['Incidents_All'])
                             ## similarly opacity is reduced if value is 0
                             , 'fillOpacity': 0.2 if x['properties']['Incidents_All'] == 0 
                             else 0.7}


folium.GeoJson(
    nhood_map,
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['name', 'Incidents_All', 'QP_str'],
        aliases=['Neighborhood', 'Incidents', 'Share of total'],
        localize=True
    )
).add_to(m)

## set the caption first, then add the legend to the map once
colorscale.caption = 'DC Crime Map 2019'
colorscale.add_to(m)
m
Out[38]:
In [40]:
nhood_map['QP'] = nhood_map['Incidents_All'] / nhood_map['Incidents_All'].sum()
nhood_map['QP_str'] = nhood_map['QP'].apply(lambda x : str(round(x*100, 1)) + '%')

from branca.colormap import linear
nbh_count_colormap = linear.YlGnBu_09.scale(min(nhood_map['Incidents_All']),
                                            max(nhood_map['Incidents_All']))
nbh_count_colormap
Out[40]:
[colormap legend: 2 to 1892]
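A linear colormap turns a value into a color by interpolating between anchor colors. The sketch below shows the idea with only two anchors. The function `lerp_color` and the two endpoint colors are assumptions chosen for illustration; the branca palette used above interpolates across more anchor colors than this.

```python
def lerp_color(value, vmin, vmax, low_rgb, high_rgb):
    """Linearly interpolate between two RGB colors and return a hex string."""
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values outside [vmin, vmax]
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low_rgb, high_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)

# Hypothetical light-yellow and dark-blue endpoints
LOW = (255, 255, 217)
HIGH = (8, 29, 88)

# 2 and 1892 are the minimum and maximum shown in the legend above
print(lerp_color(2, 2, 1892, LOW, HIGH))
print(lerp_color(1892, 2, 1892, LOW, HIGH))
```

Because the mapping is linear, a few very high counts stretch the scale, and most neighborhoods end up in similar light shades.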
In [41]:
## identifies the center point of all the neighborhood shapes 
centroid=dc_zil_gdf.geometry.centroid 
## initiates a map based on the centroid
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], tiles="Stamen Toner", zoom_start=12) 
style_function = lambda x: {
    'fillColor': nbh_count_colormap(x['properties']['Incidents_All']),
    'color': 'black',
    'weight': 1.5,
    'fillOpacity': 0.7
}

folium.GeoJson(
    nhood_map,
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['name', 'Incidents_All', 'QP_str'],
        aliases=['Neighborhood', 'Incidents', 'Share of total'],
        localize=True
    )
).add_to(m)

## set the caption first, then add the legend to the map once
nbh_count_colormap.caption = 'DC Crime Map 2019'
nbh_count_colormap.add_to(m)
m
Out[41]:
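For the third topic, embedding a visual on a website, one common route is to save the folium map as a standalone HTML file with `m.save(...)` and point an `<iframe>` at it. The sketch below only builds the embedding markup. The file name `dc_crime_map.html` and the helper `iframe_snippet` are assumptions for illustration, and it presumes the map has already been saved under that name next to the page.

```python
from html import escape

def iframe_snippet(src, title, width="100%", height=600):
    """Build an <iframe> tag that embeds a saved map file in a web page."""
    return (
        '<iframe src="{src}" title="{title}" width="{w}" height="{h}" '
        'style="border:none;"></iframe>'
    ).format(src=escape(src, quote=True), title=escape(title, quote=True),
             w=width, h=height)

# Hypothetical file name; assumes m.save("dc_crime_map.html") was run first
snippet = iframe_snippet("dc_crime_map.html", "DC Crime Map 2019")

page = (
    "<!DOCTYPE html>\n<html>\n<body>\n"
    "<h1>DC Crime Map 2019</h1>\n"
    + snippet +
    "\n</body>\n</html>\n"
)
print(page)
```

The same snippet can be pasted into an existing page, provided the saved map file is uploaded alongside it.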

Create a Crime Map Using Python Plotly

In [107]:
dc_crime_2019.head()
Out[107]:
X Y CCN REPORT_DAT SHIFT METHOD OFFENSE BLOCK XBLOCK YBLOCK ... CENSUS_TRACT VOTING_PRECINCT LATITUDE LONGITUDE BID START_DATE END_DATE OBJECTID OCTO_RECORD_ID neighborhood
0 -76.982944 38.887599 10199597 2019-11-07T11:41:36.000Z DAY OTHERS THEFT/OTHER 1500 - 1599 BLOCK OF INDEPENDENCE AVENUE SE 401480.0 135528.0 ... 6801.0 Precinct 87 38.887592 -76.982941 NaN 2019-11-07T10:36:52.000Z 2019-11-07T11:42:02.000Z 429611163 10199597-01 Kingman Park
1 -77.010378 38.820469 17084415 2019-01-28T00:00:00.000Z MIDNIGHT GUN HOMICIDE 130 - 199 BLOCK OF IRVINGTON STREET SW 399099.0 128076.0 ... 10900.0 Precinct 126 38.820461 -77.010375 NaN 2017-05-19T22:58:53.000Z 2017-05-20T02:26:45.000Z 429841378 17084415-01 Bellevue
2 -76.952665 38.920544 18208996 2019-03-22T16:18:15.000Z EVENING OTHERS THEFT/OTHER 2400 BLOCK OF MARKET STREET NE 404105.0 139186.0 ... 9000.0 Precinct 139 38.920536 -76.952663 NaN 2018-12-09T17:01:49.000Z 2018-12-09T18:49:21.000Z 429890611 18208996-01 Fort Lincoln
3 -77.027565 38.897353 18221681 2019-01-01T10:24:06.000Z DAY OTHERS THEFT/OTHER 1100 - 1199 BLOCK OF F STREET NW 397609.0 136611.0 ... 5800.0 Precinct 129 38.897346 -77.027563 DOWNTOWN 2018-12-31T11:49:19.000Z 2018-12-31T14:43:21.000Z 429890721 18221681-01 Penn Quarter
4 -77.021929 38.899129 18221708 2019-01-01T15:48:01.000Z EVENING OTHERS THEFT/OTHER 700 - 799 BLOCK OF 7TH STREET NW 398098.0 136808.0 ... 5800.0 Precinct 129 38.899121 -77.021926 DOWNTOWN 2018-12-31T12:48:46.000Z 2018-12-31T12:51:47.000Z 429890728 18221708-01 Chinatown

5 rows × 26 columns

In [19]:
ave_lat = sum(dc_crime_2019.Y)/len(dc_crime_2019.Y)
ave_long = sum(dc_crime_2019.X)/len(dc_crime_2019.X)
In [20]:
ave_lat
Out[20]:
38.908362901861594
In [21]:
import plotly
import plotly.graph_objs as go

# Generate an access token for this project 
mapbox_access_token = 'pk.eyJ1Ijoid2FuZzY1MDYiLCJhIjoiY2tiNGtra2ozMHVoYjJ3bzlsMThtenNyOCJ9.jPlsp6JCn_Vu_GzykjHtnw'
my_style = "mapbox://styles/wang6506/ckb4kl37t0q271jmpxc8akwsg"

trace = go.Scattermapbox(
  lat = dc_crime_2019['Y'],
  lon = dc_crime_2019['X'],
  marker = go.scattermapbox.Marker(size = 5,opacity = 0.7),
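  ## one hover string per point: block on the first line, offense on the second
  text = dc_crime_2019['BLOCK'].astype(str) + '<br>' + dc_crime_2019['OFFENSE'].astype(str)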
)

layout = go.Layout(
     title = 'DC Crime Visual',
     width = 1000, height = 1000,
     mapbox = go.layout.Mapbox(
       accesstoken = mapbox_access_token,
       bearing = -50,
       pitch  = 50,
       zoom = 12,
       center = go.layout.mapbox.Center(lat=ave_lat,lon=ave_long),
       style = my_style
     ),
 )
fig = go.Figure(data = trace, layout = layout)
plotly.offline.iplot(fig)
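Hover text on a scatter map is usually supplied as one string per point. A small sketch of building such strings from the `BLOCK` and `OFFENSE` columns, using two rows copied from the table above as toy data (`<br>` produces a line break in Plotly hover labels):

```python
import pandas as pd

# Two rows taken from the head() output above
toy = pd.DataFrame({
    "BLOCK": ["1100 - 1199 BLOCK OF F STREET NW", "2400 BLOCK OF MARKET STREET NE"],
    "OFFENSE": ["THEFT/OTHER", "THEFT/OTHER"],
})

# Missing values become empty strings so the concatenation never yields NaN
hover_text = (toy["BLOCK"].fillna("") + "<br>" + toy["OFFENSE"].fillna("")).tolist()
print(hover_text[0])
```

The resulting list has the same length as the latitude and longitude columns, so it can be passed as the `text` argument of the trace.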